ABSTRACT

The Stochastic or Code-Excited Linear
Predictive Coder (CELP) is among the promising 
candidates for producing good quality speech at low 
bit rates. However, the speech quality produced 
suffers from perceived roughness. Many researchers 
have used pole-zero postfilters to mask the 
roughness at the output of the synthesis filter. 
Although the postfilters are effective in masking 
the noise at low bit rates, they produce spectral 
distortions. In this paper¹, it is proposed to
improve the speech quality by introducing two 
modifications to the fixed stochastic codebook. In 
the first modification, the stochastic codebook is 
used only when the long-term correlations are low.
Otherwise, a pulse-like codebook is selected. In the
second modification, the selected codebook output
is weighted using an adaptive spectral shaping 
procedure. These two modifications have been 
incorporated in a 4800 bps CELP coder and have
resulted in perceptually improved vocoded speech.

1. INTRODUCTION 

Code-Excited Linear Predictive Coding (CELP)
was first introduced by Atal and Schroeder in 1984
[1]. This algorithm represented a breakthrough for 
achieving good quality speech at rates below 4800 
bps. Its major drawback was its computational 
complexity, which was prohibitive for real-time 
applications. Since then, many speech researchers, 
recognizing the tremendous potential of this 
algorithm, experimented with different methods 
for simplifying the algorithm. At the same time, 
the DSP chips became more powerful and floating 
point processors were introduced. These activities 
have resulted in the implementation of CELP (or
similar algorithms developed later, such as Vector
Adaptive Predictive Coders (VAPC) [2] or Vector
Sum Excited Linear Predictive Coders (VSELP) [3])
using a single DSP chip. In this paper, a brief
description of a reference CELP coder is given,
followed by two modifications proposed for the
stochastic codebook. The modified CELP algorithm
has been implemented on a single TMS320C25
DSP chip and operates in full-duplex at the rate of
4800 bps.

¹ This project was sponsored by
DCEM/DRDCS, code number 0417U.

2. REFERENCE CODER DESCRIPTION 

The reference coder’s synthesizer is depicted in 
Fig. 1. The excitation to the synthesizer, e(n), given
by (1), is the linear combination of a vector from
the stochastic codebook, x(K+n), and a vector from
the adaptive codebook, e(n-L):

\[ e(n) = G\,x(K+n) + \beta\,e(n-L), \qquad n = 0, \ldots, S-1 \tag{1} \]

where K and L are the optimal indices of the
stochastic and adaptive codebooks respectively, S is
the subframe size, and G and β are the gains of the
vectors from the respective codebooks. During
voiced sounds, the lag coefficient β is close to
unity. In this case most of
the contribution to the excitation comes from the 
adaptive codebook (which represents the excitation 
history) and the stochastic codebook entry acts 
only as a correcting term. For lower bit rates the 
stochastic codebook contribution may cause noisy 
synthetic speech for two reasons. First, the LPC
model with a limited number of coefficients (10) 
fails to entirely remove all the short-term 
correlations from the speech input signal. The 
result is an intelligible residual with a non-flat 
spectrum. This suggests that the stochastic
codebook may be inadequate, particularly for a
small codebook size (60 entries in our
implementation), giving rise to rough synthetic
speech. The second reason for the synthetic speech 
roughness may be attributed to the use of the 
stochastic codebook during voiced frames. To
mask the noise and compensate for the two
deficiencies listed above, the reference coder uses
formant and lag postfilters [7] at the output of the
synthesizer. The postfilters were
reported to be effective in masking the noise but
have the disadvantage of creating spectral
distortions, particularly when the vocoders are used
in tandem. In order to improve the synthesized 
speech quality without introducing the inherent 
spectral distortions produced by the postfilters, two 
modifications to the stochastic codebook are 
introduced, namely the use of a pulse-like codebook 
instead of the stochastic codebook during segments 
with long-term correlation coefficients larger than a 
certain threshold, and the spectral weighting of the 
output of the codebooks. These two modifications 
are described in the following sections.
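
The excitation rule in (1) can be sketched in a few lines of Python. This is a minimal illustration, not the coder's TMS320C25 implementation: the function name `build_excitation`, the flat layout of the stochastic codebook (entry K is taken to be the run of samples starting at offset K), and the sample-by-sample handling of lags shorter than the subframe are all assumptions made for the example.

```python
def build_excitation(codebook, K, G, history, L, beta, S):
    """Return S samples of e(n) = G*x(K+n) + beta*e(n-L).

    codebook : flat list of stochastic samples x (assumed layout)
    history  : past excitation samples, most recent last (len >= L)
    """
    if L < 1 or L > len(history):
        raise ValueError("lag must lie within the stored history")
    if K < 0 or K + S > len(codebook):
        raise ValueError("codebook index out of range")
    buf = list(history)      # past excitation, extended as we go
    start = len(buf)         # position of e(0) inside buf
    for n in range(S):
        stochastic = G * codebook[K + n]        # G * x(K+n)
        adaptive = beta * buf[start + n - L]    # beta * e(n-L)
        buf.append(stochastic + adaptive)
    return buf[start:]
```

With β close to unity and a small G, the output is essentially the previous pitch cycle repeated, which is the voiced-speech behaviour described above.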

3. VOICED/UNVOICED CODEBOOKS

It was observed that the synthetic speech 
produced by Pitch-Excited LPC coders sounds 
smoother (although less natural) than the reference 
coder. This may be attributed to the inadequacy of 
the stochastic codebook during voiced sounds. In 
order to smooth the CELP synthetic speech, we 
tried to imitate the excitation model of the Pitch- 
Excited coder. Thus, instead of using the stochastic 
codebook during voiced subframes, a pulse-like 
codebook is used. The number of pulses in the 
codebook is a function of the lag (which has been 
determined earlier in the analysis process) and the 
subframe size. The number of the codebook indices 
is chosen to be equal to the subframe size. For lag 
values larger than the subframe size, each codebook 
vector contains only one non-zero pulse positioned 
at a location equal to the codebook index. For lag 
values less than the subframe size, the first S-L+1
vectors, where S is the subframe size and L is the 
lag, contain two pulses each (with equal 
amplitudes) separated by the lag value and the 
location of the first pulse is equal to the codebook 
index. The remaining vectors contain only one non- 
zero pulse. The dual codebook is shown in Fig. 2, 
where if |β| is larger than a certain threshold T
(T > 0), the speech is considered voiced, and
consequently the pulse-like codebook is chosen. It is
to be noted that the threshold is valid for values of
|β| close to unity, where speech is voiced, as well as
for large values of |β|, which indicate voicing onset.
Otherwise, the stochastic codebook is used. The
output of either codebook is frequency weighted as 
described in the following section. 
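
The construction of the pulse-like codebook and the voiced/unvoiced switch can be sketched as follows. The names `pulse_codebook` and `select_codebook`, the unit pulse amplitude, and the 0-based indexing are assumptions of this example. A second pulse is added only when it falls inside the subframe; under 0-based indexing this may give a slightly different count of two-pulse vectors than the one stated in the text, whose indexing convention is not specified here.

```python
def pulse_codebook(S, L):
    """Build S vectors of length S for subframe size S and lag L."""
    if S < 1 or L < 1:
        raise ValueError("S and L must be positive")
    vectors = []
    for index in range(S):
        v = [0.0] * S
        v[index] = 1.0                 # first pulse at the codebook index
        second = index + L             # second pulse one lag later
        if second < S:                 # only possible when L < S
            v[second] = 1.0            # equal amplitude, as in the text
        vectors.append(v)
    return vectors


def select_codebook(beta, T, stochastic, pulse):
    """Pick the pulse-like codebook when |beta| exceeds threshold T."""
    return pulse if abs(beta) > T else stochastic
```

For example, `pulse_codebook(4, 2)` yields two vectors with a pulse pair two samples apart, followed by two single-pulse vectors.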

4. WEIGHTING OF THE DUAL CODEBOOK 

As mentioned earlier, the residual signal 
exhibits some short-term correlations which have 
not been successfully removed by LPC analysis. 
Intuitively, it makes sense to spectrally weight the 
dual codebook adaptively so that its spectrum will 
be as close as possible to that of the residual (after 
removing the long-term correlations). Several 
researchers have tried to apply spectral shaping to 
the excitation. In [4], an all-pole spectral shaping
was applied to the excitation of a Pitch-Excited
Linear Predictive Coder. In [5], a pole-zero
weighting filter was used to shape the excitation of
a CELP coder. The result was a perceived quality
comparable to that obtained with the postfilter,
without the inherent distortion introduced by the
latter. We have chosen a filter similar to [4]. The
dual codebook is adaptively weighted every frame 
by 

\[ W(z) = \frac{1}{1 + \sum_{i=1}^{10} F^{i} a_i z^{-i}} \tag{2} \]

where the a_i's are the LPC filter coefficients and F
is a modulus reduction factor given by

\[ F = \alpha \prod_{i=1}^{10} \left(1 - k_i^2\right), \qquad \alpha \le 1 \tag{3} \]

In (3), the k_i's are the reflection coefficients and α
is a scaling factor. When the LPC prediction is
efficient, as is the case for front vowels, [illegible]
and nasals, the residual has a relatively flat
spectrum and F is small. In this case W(z) acts as
an all-pass filter and the dual codebook is
minimally weighted. For speech sounds where the
LPC prediction is not as good, F is relatively large
and W(z) gives the dual codebook a spectral shape
similar to that of the residual.
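
A minimal sketch of the adaptive weighting is given below. It rests on stated assumptions rather than on the actual DSP code: F is read as a pole-modulus reduction factor, so that each coefficient a_i is scaled by F^i; the filter state is reset to zero for every codebook vector; and the names `modulus_reduction_factor` and `weight_vector` are hypothetical.

```python
def modulus_reduction_factor(k, alpha):
    """F = alpha * prod(1 - k_i^2) over the reflection coefficients k."""
    F = alpha
    for ki in k:
        F *= (1.0 - ki * ki)
    return F


def weight_vector(x, a, F):
    """Filter x through the all-pole W(z) = 1 / (1 + sum F^i a_i z^-i).

    a holds a_1..a_p; the filter starts from zero state (assumption).
    """
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= (F ** i) * ai * y[n - i]
        y.append(acc)
    return y
```

When F is close to zero the vector passes through almost unchanged, and as F approaches one the output takes on the full LPC spectral envelope, matching the two regimes described above.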

4. EXPERIMENTAL RESULTS 

The two modifications described above have 
been implemented in the TMS320C25 code. The 
objective measure chosen to quantify the results is 
the segmental Signal to Weighted Noise ratio 
(SWNR) defined as 
\[ \mathrm{SWNR} = \frac{\sum_{n=0}^{N-1} s^2(n)}{\sum_{n=0}^{N-1} e_w^2(n)} \tag{4} \]

where s(n) is the original input speech signal, N is
the frame size in samples, and e_w(n) is the
weighted error signal. The latter is obtained by
frequency weighting the difference between the
input and synthesized speech signals by the
conventional CELP weighting filter CW(z), defined
by

\[ CW(z) = \frac{1 + \sum_{i=1}^{10} a_i z^{-i}}{1 + \sum_{i=1}^{10} \gamma^{i} a_i z^{-i}}, \qquad \gamma = 0.75 \tag{5} \]

Informal listening tests using the real-time
hardware were performed on a variety of input
speech samples. Both of the modifications
introduced above produced synthetic speech which
may be described as cleaner and fuller than that
produced by the reference coder. However, this
improvement was not translated into a significant
improvement in the average segmental SWNR.
Over a limited database of 999 frames, the
improvement was only 0.27 dB in the case of the
weighted codebook and 0.24 dB in the case of the
dual codebook. However, the improvement seems to
be localized and went as high as 5 dB in the case of
the dual codebook and 7 dB in the case of the
weighted codebook. The combined use of the two
modifications resulted in increases as high as 10 dB
in some frames. In Fig. 3a, 3 seconds of input
speech from a male speaker are shown and the
segmental SWNR of the corresponding synthesized
speech is shown in Fig. 3b. The differences in
segmental SWNR between the synthesized speech
for each modification and the reference signal are
shown in Figs. 3c and 3d. It can be noted from
Fig. 3d that the largest increases in SWNR between
the coder using the dual codebook and the reference
coder tend to occur at the beginning of transitions.

5. CONCLUSIONS

In this paper, two modifications to the
stochastic codebook were introduced, namely the
adaptive frequency weighting of the stochastic
codebook and the use of a dual codebook. This has
resulted in an improved speech quality of the
vocoder. Although these modifications resulted in
an SWNR increase of up to 10 dB in some frames,
the improvements seem to be localized and were
not translated into a significant increase in the
average segmental SWNR.
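
For reference, the per-frame SWNR of (4), with the error weighted by CW(z) of (5), could be computed along the following lines. This is only a sketch under assumptions: the function names are hypothetical, the weighting filter starts from zero state in each frame, and the conversion to decibels via 10 log10 is assumed because the results are quoted in dB.

```python
import math


def weighted_error(s, s_hat, a, gamma=0.75):
    """Pass d(n) = s(n) - s_hat(n) through CW(z) = A(z) / A(z/gamma)."""
    d = [x - y for x, y in zip(s, s_hat)]
    e_w = []
    for n in range(len(d)):
        acc = d[n]
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * d[n - i]                    # numerator taps
                acc -= (gamma ** i) * ai * e_w[n - i]   # denominator taps
        e_w.append(acc)
    return e_w


def swnr(s, e_w):
    """Ratio of signal energy to weighted-error energy over one frame."""
    return sum(x * x for x in s) / sum(e * e for e in e_w)


def swnr_db(s, e_w):
    """The same ratio expressed in decibels (assumed convention)."""
    return 10.0 * math.log10(swnr(s, e_w))
```

A frame with zero weighted error would raise a division error here; a practical measurement would need to guard against that case.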